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Abstract 

Kohn-Sham density functional theory is one of the most widely used electronic 
structure theories. In the pseudopotential framework, uniform discretization 
of the Kohn-Sham Hamiltonian generally results in a large number of basis 
functions per atom in order to resolve the rapid oscillations of the Kohn-Sham 
orbitals around the nuclei. Previous attempts to reduce the number of basis 
functions per atom include the usage of atomic orbitals and similar objects, but 
the atomic orbitals generally require fine tuning in order to reach high accuracy. 
We present a novel discretization scheme that adaptively and systematically 
builds the rapid oscillations of the Kohn-Sham orbitals around the nuclei as well 
as environmental effects into the basis functions. The resulting basis functions 
are localized in the real space, and are discontinuous in the global domain. The 
continuous Kohn-Sham orbitals and the electron density are evaluated from the 
discontinuous basis functions using the discontinuous Galerkin (DG) framework. 
Our method is implemented in parallel and the current implementation is able to 
handle systems with at least thousands of atoms. Numerical examples indicate 
that our method can reach very high accuracy (less than ImeV) with a very 
small number (4 ^ 40) of basis functions per atom. 
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1. Introduction 



Electronic structure theory describes the energies and distributions of elec- 
trons, and is essential in characterizing the microscopic structures of molecules 
and materials in condensed phases. Among all the different formalisms of elec- 
tronic structure theory, Kohn-Sham density functional theory (KSDFT) [2| 
achieves so far the best compromise between accuracy and efficiency, and has 
become the most widely used electronic structure model for condensed matter 
systems. Kohn-Sham density functional theory gives rise to a nonlinear eigen- 
value problern, which is commonly solved using the self-consistent field itera- 
tion method [Sl . In each iteration, the Kohn-Sham Hamiltonian is constructed 
from a trial electron density and is discretized into a finite dimensional matrix. 
The electron density is then obtained from the low- lying eigcnf unctions, called 
Kohn-Sham orbitals, of the discretized Hamiltonian. The resulting electron 
density and the trial electron density are then mixed and form a new trial elec- 
tron density. The loop continues until self-consistency of the electron density 
is reached. An efficient algorithm therefore contains three phases: discretiza- 
tion of the Hamiltonian; evaluation of the electron density from the discretized 
Hamiltonian; and self-consistent iteration. In this paper, we focus on the dis- 
cretization of the Hamiltonian and the evaluation of the electron density in the 
pseudopotential framework 0]. 

If space is uniformly discretized, the Kohn-Sham Hamiltonian generally re- 
quires a basis set with a large number of degrees of freedom per atom. For 
most chemical systems, the kinetic energy cutoff typically ranges from 15Ry 
to 90Ry for standard planewave discretization in the norm-conserving pseu- 
dopotential framework 1J|, which amounts to about 500 ^ 5000 basis functions 
per atom. The required number of basis functions per atom is even larger for 
uniform discretization methods other than planewaves, such as finite difference 
method [1, @] and finite element method 043 ■ 

The large number of basis functions per atom originates from the rapid os- 
cillation of the Kohn-Sham orbitals. The Kohn-Sham orbitals oscillate rapidly 
around the nuclei and become smooth in the interstitial region of the nuclei. 
Physical intuition suggests that the rapid oscillations around the nuclei are in- 
ert to changes in the environment. A significant part of the rapid oscillations 
can already be captured by the orbitals associated with isolated atoms. These 
orbitals are called atomic orbitals. Numerical methods based on atomic orbitals 
or similar ideas have been designed based on this observation Envi- 
ronmental effect is not built into the atomic orbitals directly, but can only be 
approximated by fine tuning the adjustable parameters in these atomic orbitals. 
The values of the adjustable parameters therefore vary among different chemical 
elements and exchange-correlation potentials, and sometimes vary among the 
different ambient environment of atoms. The quality of the atomic orbitals are 
difficult to be improved systematically, but relies heavily on the experience of 
the underlying chemical system. 

Atomic orbitals and uniform discretization methods can be combined, as 
in the mixed basis methods lB"21|. The quality of the basis functions can 
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therefore be systematically improved by incorporating the uniform discretization 
methods. However, fine tuning the adjustable parameters is still necessary due 
to the absence of the environmental effect in the basis functions, and in certain 
circumstances the number of basis functions per atom is still large. 

In this paper we propose a novel discretization method to build the environ- 
mental effects into the basis set to achieve further dimension reduction of the 
basis set. The basis functions are constructed adaptivcly and seamlessly from 
the atomic configuration in local domains, called elements. The basis functions 
are discontinuous at the boundary of the elements, and they form the basis 
set used in the discontinuous Galerkin (DG) framework. The flexibility of the 
DG framework allows us to employ these discontinuous basis functions to ap- 
proximate the continuous Kohn-Sliam orbitals, and allows us to achieve high 
accuracy (less than ImeV) in the total energy calculation with a very small 
number (4 40) of basis functions per atom.. Our method is implemented in 
parallel with a rather general data communication framework, and the current 
implementation is able to calculate the total energy for systems consisting of 
thousands of atoms. 

The discontinuous Galerkin framework has been widely used in numerical 
solutions of partial differential equations (PDE) for more than four decades, see 
for example 22l427l| and the references therein. One of the main advantages 
of the DG method is its flexibility in the choice of the basis functions. The 
idea of constructing basis functions adaptively from the local environment has 
also been explored in other circumstances in numerical analysis such as reduced 



basis method |28l - l31| and multi-scale discontinuous Galerkin method |32l - l34| 
for solving PDE. In the current context, we apply the DG algorithm to solve 
eigenvalue problems with oscillatory eigenfunctions, and the basis functions are 
constructed by solving auxiliary local problems numerically. 

The paper is organized as follows. Section [2] introduces the discontinuous 
Galerkin framework for Kohn-Sham density functional theory. The construc- 
tion of the adaptive local basis functions is introduced in Section [31 Section U] 
discusses implementation issues in more detail. The performance of our method 
is reported in Section [5l followed by the discussion and conclusion in Section [HI 



2. Discontinuous Galerkin framework for Kohn-Sham density func- 
tional theory 

2.1. Brief introduction of KSDFT 

The Kohn-Sham energy functional in the pseudopotential framework Q is 
given by: 



EKsiHh}) = ^ ^ / I VV'd' dx + / V^^tpdx + J2leJ2 j 

1 /■/■ p{x)p{y) 



\x ~ y\ 



da;dy+ / exc[p(a:)] dx, (1) 
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where p{x) = (x) and the {V'il's satisfy the orthonormal constraints: 

i^*^jdx = 6,,. (2) 

In ([1]), we have taken the Kleinman-Bylander form of the pseudopotential (35| . 
The pseudopotential is given by 

Vps = Voxt +y^^je\be){be\. 

e 

For each £, bi is a function supported locally in the real space around the posi- 
tion of one of the atoms , 7^ = +1 or —1, and we have used the Dirac bra-ket 
notation. We have ignored the smn degeneracy and have adopted the local den- 
sity approximation (LDA) [36, 37 1 for the exchange-correlation functional. The 



proposed method can also be used for more complicated exchange-correlation 
functionals and when spin degeneracy is involved. 

The Kohn-Sham equation, or the Euler-Lagrange equation associated with 
(dl reads 

Hes[p]^^ = (-iA + Ves[p] +J2-fe\b£){bi\)^p^ = E,i;„ (3) 

e 

where the effective one-body potential Vctt is given by 

Kff [p] (x) = Fext ix)+ f dy + e'^, [p{x)\. (4) 

Note that ^ is a nonlinear eigenvalue problem, as VcS depends on p, which is 
in turn determined by {ipi}- The electron density is self-consistent if both (O 
and (lU) are satisfied. After obtaining the self-consistent electron density, the 
total energy of the system can be expressed using the eigenvalues {Ei\ and p 
as 13 

E,,,^Y.E,-\ 11 e^^}Mdxdy + J e^Mx)]dx~J e'MxMx)dx. (5) 

The goal of Kohn-Sham density functional theory is to calculate the total energy 
Etot and the self-consistent electron density p given the atomic configuration. 

Numerical algorithms for Kohn-Sham density functional theory can be broadly 
divided into two categories: One may try to directly minimize the energy func- 
tional ([ij with respect to the Kohn-Sham orbitals {ipi} (see e.g. [38[); one 
may also try to look for a solution for ([3]), usually by using the self-consistent 
iteration. 

The self-consistent iteration goes as follows. Starting with an initial guess 
po, one looks for a solution of ^ iteratively: 

1. Discretization of the Hamiltonian: Determine the effective Hamiltonian 
Etcft[Pn] from the input density at the n-th step p„; 
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2. Evaluation of the electron density: Obtain p = from the effective 
Hamiltonian Heff[pn]; 

3. Self-consistent iteration: Determine the input density at the (n + l)-th 
step Pn+i from p„ and p, for instance: 

Pn+i = apn + (1 - a)p 

with some parameter a. 

4. If \\pn — p\\ < (5, stop; otherwise, go to step (1) with n <— n + 1. 

Remark. The mixing step above is called linear mixing in the literature, which 
is the simplest choice. More advanced mixing schemes [s^ can be used as 
well. The mixing scheme used in our current implementation is the Anderson 
mixing scheme [39| , but we will not go into the details of mixing schemes in this 
work. 

In this paper we focus on the discretization of the Hamiltonian and the 
evaluation of the electron density. Given an effective potential T4ff, we find p 
from 

N 

p(x) = ^|^,|'(^), (6) 

i=l 

where the {ijjiYs arc the first N eigcnfunctions of iJcff- 

H,sil>^ = (-iA + Kff + ^ lt\bi){ht\)^, - E,i,,. (7) 

Note that the {V'il's minimize the variational problem 

w N 
£^off({V'.})= 2E / IVV-.G^)!' dx+ / V,^{x)p{x)Ax + Y,-f,Y,\{bi.^:)\\ 

(8) 

with the orthonormality constraints {i^ii^ipj) ~ Sij. 

The evaluation of the electron density is clearly the main bottleneck in the 
self-consistent iteration, which is the focus of the numerical algorithms for Kohn- 
Sham density functional theory. We consider efficient and accurate discretiza- 
tion for the evaluation of the electron density in this work. 

2.2. Discontinuous Galerkin method for KSDFT 

The discontinuous Galerkin (DG) methods have been developed for different 
types of partial differential equations [22-27|. One of the main advantages of the 
DG method is its flexibility in the choice of the approximation space, as the DG 
method does not require the continuity condition of the basis functions across 
the interfaces of the elements. This flexibility is important for constructing 
effective discretization schemes for Kohn-Sham density functional theory. 

We present in the following a DG method for the evaluation of the electron 
density. Among the different formalisms in the DG framework, we will use 
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the interior penalty method [2^ |2^. The interior penalty method naturally 
generalizes the variational principle (|8]). 

Wc denote by O the computational domain with the periodic boundary con- 
dition, which corresponds to F point sampling in the Brillouin zone, fl is also 
referred to as the global domain in the following discussion. Bloch boundary 
conditions can be taken into account as well, and this will appear in future 
publications. Let T be a collection of quasi-uniform rectangular partitions of Q 
(see Fig. [2] for an example with four elements): 

T= {Ei,E2,--- .Em}, (9) 

and S be the collection of surfaces that correspond to T- Each Ek is called an 
element of O. For a typical choice of partitions used in practice, the elements are 
chosen to be of the same size. For example, for a crystalline material, elements 
can be chosen as integer multiples of the conventional cell of the underlying 
lattice. As a result, unlike the usual finite element analysis, the element size 
will remain the same. 

In the following discussion, we use extensively the inner products defined as 
below 



(v,w)^= / V* {x)w{x) dx, (10) 
Je 

{v,w)g~ / V* (x) ■ w{x) ds{x), (11) 
Js 

M 

{v,w)^ ^J2(^^^)e.^ (12) 
1=1 



ses 



In the discontinuous Galerkin method (the interior penalty method) , the discrete 
energy functional corresponding to ([S]) is given by 



N N 



i=l i=l 

N N 

+ ?E(W]'W]). + E^^EK^^'^^)r^ (14) 

1=1 i i=l 

Here the last term comes from the non-local terms in Eq. ([5]), and {{ • }} and 
[[ • ]] are the average and the jump operators across surfaces, defined as follows. 
For S G S° the set of interior surfaces, we assume S is shared by elements Ki 
and K2. Denote by ni and 712 the unit normal vectors on S pointing exterior 



^In the language of finite element method, we will not use the /i-refinement. 
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to Ki and K2, respectively. With Ui = u\QKi, i = 1,2, we set 

[[u]] = uiTii + it27i2 on S. (15) 
For 5 e 5'^ where is the union of the surfaces on the boundary, we set 

[[u]] = un on S, (16) 
where n is the outward unit normal. For vector-valued function g, we define 

{{«}} = 5(91 +92) on 5 £5°, (17) 

where qi = q\dK , and 

{{q}}=g on 5 £5^. (18) 

Note that in the current context S — S° since we assume periodic boundary 
condition for the computational domain, and every surface is an interior surface. 
The constant a in (|14p is a positive penalty parameter, which penalizes the 
jumps of functions across element surfaces to guarantee stability. The choice of 
a will be further discussed in Section [S] 

Assume that we have chosen for each element iSj. a set of basis functions 
{Vfe.j }/=ii where Jk is the number of basis functions in E^. We extend each tpk^j 
to the whole computational domain VL by setting it to be on the complement 
set of Ek- Define the function space V as 

V = span{<^fcj, Ek £ T, .? = 1, • • • , Jk]- (19) 

We minimize ([H]) for {V'z} C V. The energy functional in the approxi- 
mation space V leads to the following eigenvalue problem for {V'ilfLi- For any 
u e V, 

\ (v^', v^.)^ - \ { M , {{ vv^.}}), - 1 {{{vv}}, m + J ( , m ), 

+ {v, Vcsipi)j- + ^11 {v, bi)j- {bi, = Xi {v, ipi)j- . (20) 
e 

Setting V = (fik'.j' and 

Jk 

^C'^k.jVk.j, (21) 
we arrive at the following linear system 

1 a 

- 2 ({{^'/'fe'J'}}' [['/'fe.^]] + X ( 1'^'=' J']] ' [["^fej]] + {Vk',3':VoS'Pk,3)r 

+ ^11 {Vk',j',bi)j. {bi, ipk^j)rjCt;k^j = K {ipk' , ipk,j) c,.k,j- (22) 
i ^ k,j 
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We define A to be the matrix with entries given by the expression in the paren- 
theses, B to be the matrix with entries {(pk'j'^'fikj), and Ci to be the vector 
with components {ci-k,j)k,j -i we have the following simple form of generalized 
eigenvalue problem 

Aci = XiBci 

ior i = 1,2, . . . , N. Following the standard terminologies in the finite element 
method, we call A the (DG) stiffness matrix, and B the (DG) mass matrix. In 
the special case when the DG mass matrix B is equal to the identity matrix, 
we have a standard eigenvalue problem Aci = XiCi. Once {ci} are available, the 
electron density is calculated by 



N 



EE 



Jk 



(23) 



3. Basis functions adapted to the local environment 

The proposed framework in the last section is valid for any choice of basis 
functions. To improve the efficiency of the algorithm, it is desirable to use 
less number of basis functions while maintaining the same accuracy. In order to 
achieve this goal, the choice of the functions {(pkj} is important. In this section, 
we discuss a way to construct the basis functions {ifkj} that arc adapted to the 
local environment. 

The starting point is the following observation. The Kohn-Sham orbitals 
{ipi} exhibit oscillatory behavior around the nuclei. In a full electron calculation, 
the nuclei charge density is the summation of delta functions located at the 
positions of the nuclei (or numerical delta function after discretization) and 
the Kohn-Sham orbitals have cusp points at the positions of the atoms. In 
the pseudopotential framework which involves only valence electrons, one can 
still see that the Kohn-Sham orbitals and the electron density are much more 
oscillatory near the atom cores than in the interstitial region, as illustrated in 
Fig. [T] In the setting of real space method or planewave method, in order to 
resolve the Kohn-Sham orbitals around the atom cores where the derivatives of 
Kohn-Sham orbitals become large one has to use a uniform fine mesh. Therefore, 
the number of mesh points becomes huge even for a small system. This makes 
the electronic structure calculation expensive. 

In order to reduce the cost, we note that the Kohn-Sham orbitals are smooth 
away from the atoms and the uniform fine discretization is not efficient enough. 
Adaptive refinement techniques can be used to improve the efficiency by reduc- 
ing the number of basis functions per atoms. Techniques of this type include 



finite element based adaptive mesh refinement method [41| , finite volume based 
adaptive mesh refinement method, and multiresolution basis functions [42l - l44l | . 
to name a few. Our approach builds the oscillatory behavior the Kohn-Sham 
orbitals near the atom cores into the basis functions. Hence, a small number of 
basis functions are enough to characterize the Kohn-Sham orbitals. This idea is 
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not entirely new. For example, the philosophy of pseudopotential techniques is 
quite similar, though the reduction is done at the analytic level. On the side of 
numerical methods, the ideas behind atomic orbital basis or numerical atomic 



The main difference from the previous approaches is that instead of prede- 
termining basis functions based on the information from isolated atoms, our 
approach builds the information from the local environment into the basis func- 
tions as well. Thanks to the flexibility of the discontinuous Galerkin framework, 
this can be done in a seamless and systematic way. The basis functions form a 
complete basis set in the global domain il. The basis set is therefore efficient, 
and at the same time the accuracy can be improved systematically. This is an 
important difference between this approach and the previous methods along the 
same line. 

The basis functions {fkj} are determined as follows. Given the partition 
T and the effective potential Ves, let us focus on the construction of {ifkj}, 
J = 1, • • • , Jfc for one element Ek G T- As discussed above, our approach is to 
adapt {ifikj} to the local environment in Ek- 

For each element Ek, we take a region Qk ^ Ek- Qk is called the extended 
element associated with the element Ek- The set Qk\Ek is called the buffer 
area. We will choose Qk which extends symmetrically along the ±x(?/, z) direc- 
tions from the boundary of Ek ■ The length of the buffer area extended beyond 
the boundary of Ek along the ±x(y, z) direction is called the "buffer size along 
the x{y, z) direction". We restrict the effective Hamiltonian on Qk by assuming 
the periodic boundary condition on dQk and denote by HeS^Q^. the restricted 
Hamiltonian. Hefi,Qk is discretized and diagonalized, and the corresponding 
eigenfunctions are denoted by {^kj}, indexed in increasing order of the associ- 
ated eigenvalues. We restrict the first Jk eigenfunctions {^k,j} from Qk to Ek, 
denoted by {(pkj}- Each ipk.j is therefore defined locally on Ek- As discussed 
before we extend each ipk.j to the global domain f2 by setting the value to be 
on the complement of Ek- The resulting functions, still denoted by {^kj} are 
called the adaptive local basis functions. Numerical result suggests that we can 
take very small Jk to achieve chemical accuracy. 

The reason why we choose the periodic boundary condition on Qk for the 
restriction HcR^q^. is twofold. On one hand, the periodic boundary condition 
captures better the bulk behavior of the system (than the Dirichlet boundary 
condition for example); On the other hand, the periodic boundary condition 
makes the solution of HoS,q^. more easily adapted to existing DFT algorithms 
and packages, as most of them can treat periodic boundary conditions. Other 
choices such as the Neumann boundary condition are possible, and the optimal 
choice of boundary conditions remains to be an open question. 

The basis functions constructed from the buffer region capture well the local 
singular behavior of Kohn-Sham orbitals near the nuclei. Hence, the approxi- 
mation space formed by {(pk.j} gives an efficient and accurate discretization to 
the problem, as will be illustrated by numerical examples in Section [5] Note 
that the {tpk.jYs are the eigenfunctions of the self-adjoint operator Hcs.q^ on 
Qk, and therefore form a complete basis set on Qk when Jk ^ oo. This implies 
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that after restriction, the functions {ipk,j} also form a complete basis set on 
Ek as Jk ^ oo. The accuracy can therefore be systematically improved in the 
electronic structure calculation. 

Eq. (|22p proposes a generalized eigenvalue problem. From numerical point 
of view it would be more efficient if we can choose {^flkj} such that the DG 
mass matrix is an identity matrix and that Eq. ([^^ becomes a standard eigen- 
value problem. Moreover, as Jk increases, the basis functions {(pk,j} can become 
degenerate or nearly degenerate, which increases the condition number. Both 
problems can be solved at the same time by applying a singular value decom- 
position (SVD) filtering step, resulting in an orthonormal basis set {ifikj}'- 

1. For each k, form a matrix Mk = (tySfc.i, ^k,2, • ■ • , fk.Jk) with (pk,j] 

2. Calculate SVD decomposition UDV* = Mk, 

D = diag(Afc4, Afc,2, • ■ • , Afc,jJ, 
where \k,j are singular values of Mk ordered decreasingly in magnitude; 



K 7 



> S and 



•^fe,./fc+i 



<5{Jk^ Jk 



3. For a threshold S, find Jk such that 
if all singular values are larger than the threshold). Take Uj be the j-th 
column of [/, 7 = 1, • • • , J^; 

4. Set Jk ^ Jk and (pkj ^ Ukj for j = 1, • ■ • , J^. 

Remark. Although the threshold S can avoid numerical degeneracy of the basis 
functions, the numerical degeneracy is not observed for the cases studied in 
section[S] In other words, we will take 5 = 0, Jk = Jk- 

After constructing the basis functions {tpk.j}, we then apply the discontin- 
uous Galcrkin framework to solve {ipi} and hence p corresponding to Hc«- We 
summarize the overall algorithm as follows: 

1. Set n = 0, let T be a partition of into elements, and po be an initial 
trial electron density; 

2. Form the effective potential Vos [pn] and the effective Hamiltonian iJcff [pn] ; 

3. For each element Ek G T, calculate the eigenfunctions corresponding to 
the Hamiltonian Hcs.q^ on the extended element Qk, and obtain the or- 
thonormal adaptive local basis functions {vfc.j}; 

4. Solve (|22p to obtain the coefficients {ci-^k.j} for the Kohn-Sham orbitals 
and reconstruct the electron density p by (|23)) : 

5. Mixing step: Determine Pn+i from p„ and p. If \\pn — p\\ < S, stop; 
otherwise, go to step (2) with 7i ^ n + 1. 

We remark that due to the flexibility of the DG framework one can supple- 
ment the functions {(fikj} constructed above by other functions in Ef^, such as 
local polynomials in Ek , Gaussian functions restricted to Ek , and other effective 
basis functions based on physical and chemical intuition. From practical point 
of view, we find that the adaptive basis set constructed above already achieves 
satisfactory performance. 
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4. Implementation details 

This section explains the implementation details for the above algorithm. 
Specialists of the DG methods can skip this section and go directly to the 
numerical results in Section [5l This section is mostly written for the readers 
who are less familiar with the DG implementation. 

Grids and Interpolation 
The above algorithm involves three types of domains: the global domain 
il, the extended elements {Qk}, and the elements {Ek}. Quantities defined on 
these domains are discretized with different types of grids. 

• On O, the quantities such as p and V^jfr are discretized with a uniform 
Cartesian grid with a spacing fine enough to capture the singularities and 
oscillations in these quantities. 

• The grid on Qk is simply the restriction of the uniform grid of fl on Qk- 
This is due to the consideration that all quantities on Qk are treated as 
periodic and hence a uniform grid is the natural choice. 

• The grid on Ek is a three-dimensional Cartesian Legendre-Gauss-Lobatto 
(LGL) grid in order to accurately carry out the operations of the basis 
functions {(fk.j} such as numerical integration and trace operator for each 
element Ek- 

Transferring various quantities between these three grids requires the following 
interpolation operators. 

• 51 to Qk- This is used when we restrict the density p„ and the effective 
potential Ves to the extended element Qk- Since the grid on Qk is the 
restriction of the grid on ft, this interpolation operator simply copies the 
required values. 

• Qk to Ek- This is used when one restricts {^Pkj} and their derivatives to 
Ek- As the grid on Qk is uniform, the interpolation is done by Fourier 
transform. Due to the fact that both grids are Cartesian, the interpolation 
can be carried out dimension by dimension, which greatly improves the 
efficiency. 

• Ek to il. This is used when one assembles the Kohn-Sham orbitals {ipi} 
from the coefficients {ci-kj} of the elements. The interpolation from the 
LGL grid to the uniform grid is done by Lagrange interpolation, again 
carried out dimension by dimension. Averaging is performed for the grid 
points of CI shared by multiple elements. 

The non-local pseudopotentials are used both in solving {tpk.j} on each Qk 
and in the numerical integration step on the LGL grid of each Ek- In our 
implementation, the non-local pseudopotentials are directly generated in real 
space on Qk and on Ek without further interpolation between the grids. 
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4-2. Implementation of the discontinuous Galerkin method 

Wc use plancwaves in each extended element Qk to discretize the local effec- 
tive Hamiltonian ffeff.o,- and the LOBPCG algorithm [4^ with the precondi- 
tioner proposed in |46| to diagonalize the discretized Hamiltonian. The resulting 
eigenfunetions {<^fcj"}j=i of Hcs,Qk are restricted to Ek and interpolated onto 
its LGL grid. Within the SVD filtering step, the inner product that we adopt 
is the discrete weighted I2 product with the LGL weights inside Ek- The main 
advantage of the SVD filtering step is that the discontinuous Galerkin method 
results in a standard eigenvalue problem. 

The assembly of the DG stiffness matrix follows ([22]) and consists of the 
following steps. 

• For the first term i {V^Pk' ,i' , '^'Pk,j)j- and the fifth term {(fik'j' , Vcsfk,j)-j-, 
their contributions are non-zero only when k = k' since otherwise two ba- 
sis functions have disjoint support. Hence, for each fixed fc, we compute 
C^fkj'-, ^Vk.j)^^ and {(pk,j' 1 Vcffipkj) The integration is done numeri- 
cally using the LGL grid on Ek ■ Part of the stiffness matrix corresponding 
to these two terms clearly has a block diagonal form. 

• For the second, third, and fourth terms of (P^ . one needs to restrict basis 
functions and their derivatives to element faces. As the one-dimensional 
LGL grid contains the endpoints of its defining interval, this is done simply 
by restricting the values of the three-dimensional LGL grid to the element 
faces. One then calculates these three terms using numerical integration 
on these resulting two-dimensional LGL grids. Since the integral is non- 
zero only when Ek and Ek' are the same element or share a common face, 
part of the stiffness matrix corresponding to these three terms is again 
sparse. 

• The last term of (|22p is J2e^i {fk',j',bt)j- {bi,ipk,j)'j-- The integration is 
again approximated using the LGL grids of the elements. Notice that the 
contribution is non-zero iff ipk',j' and (fk^j overlap with the support of a 
common bi. Since each 6^ is localized around a fixed atom, Lpk^j and ^Pk' ,j' 
need to be sufficiently close for this term to be non-zero. As a result, part 
of the stiffness matrix corresponding to this last term is also sparse. 

Though the DG stiffness matrix A is sparse, this property is not yet exploited 
in the current implementation. The eigenvalues and eigenvectors of the DG 
stiffness matrix are calculated using the pdsyevd routine of ScaLAPACK by 
treating it as a dense matrix. We plan to replace it with more sophisticated 
solvers that leverage the sparsity of A in future. 

4.3. Parallelization 

Our algorithm is implemented fully in parallel for message-passing environ- 
ment. To simplify the discussion, we assume that the number of processors is 
equal to the number of elements. It is then convenient to index the processors 
{Pk} with the same index k used for the elements. In the more general setting 
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where the number of elements is larger than the number of processors, each pro- 
cessor takes a couple of elements and the following discussion will apply with 
only minor modification. Each processor Pk locally stores the basis functions 
Wk,j} for j ~ 1, 2, . . . , Jfe and the unknowns {ci-k,j} for i = 1, 2, . . . , and 
j = 1, 2, . . . , Jfe. We further partition the non-local pseudopotentials {bi} by as- 
signing bi to the processor Pk if and only if the atom associated to be is located 
in the element Ek- 

The eigenfunctions of the local Hamiltonian Hcs^q^ are calculated on each 
processor P^. In order to build the local Hamiltonian Hcs^q^, the processor Pk 
needs to access all the non-local pseudopotentials of which the associated atoms 
are located in Qk- This can be achieved by communication among Ek and its 
nearby elements. Once these pseudopotentials are available locally, the eigen- 
functions of Hca,Qk &re computed in parallel without any extra communication 
between the processors. 

The parallel implementation of the DG solve is more complicated: 

• For the calculation of the first and the fifth terms of the DG stiffness 
matrix A in Eq. (P^ . each processor Pk performs numerical integration 
on Ek- Since the local basis functions {^k,j} are only non-zero on Ek, this 
step is carried out fully in parallel. 

• To calculate the second, third, and fourth terms, each processor Pk com- 
putes the surface integrals restricted to the left, front, and bottom faces 
of Ek- This requires the basis functions of the left, front, and bottom 
neighboring elements. 

• To calculate the sixth term, each processor Pk computes the parts associ- 
ated with the non-local pseudopotentials {bi] located on Pk- This requires 
the access to the basis functions of all elements that overlap with bi. 

To summarize, each processor Pk needs to access the basis functions from its 
neighboring elements and from the elements that overlap with the support set 
of the non-local pseudopotentials located on the elements associated with Pk- 
Due to the locality of the non-local pseudopotentials, these elements arc geo- 
metrically close to Pk- Since the size of the elements is generally equal to or 
larger than one unit cell, the support set of the non-local pseudopotentials are 
generally within the range of the neighboring elements. Therefore, the number 
of the non-local basis functions required by Pk is bounded by a small constant 
times the typical number of the basis functions in an element. 

The use of the pdsyevd routine of ScaLAPACK for solving the eigenvalue 
problem results in another source of communication. ScaLAPACK requires 
A to be stored in its block cyclic form and this form is quite different from the 
distribution in which the DG stiffness matrix is assembled (as mentioned above) . 
As a result, one needs to redistribute A into this block cyclic form before calling 
pdsyevd and then redistribute the eigenfunctions afterwards. 

In order to support these two sources of data communication, we have im- 
plemented a rather general communication framework that only requires the 
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programmer to specify the desired non-local data. This framework then au- 
tomatically fetches the data from the processors that store them locally. The 
actual communication is mostly done using asynchronous communication rou- 
tines MPIJsend and MPLIrecv. 

5. Numerical examples 

In order to illustrate how our method works in practice, we present numerical 
results for the ground state electronic structure calculation, using sodium (Na) 
and silicon (Si) as representative examples for metallic and insulating systems, 
respectively. Wc find that very high accuracy (less than 10~^ au per atom) 
can be achieved by using only a small number of adaptive local basis functions. 
Because of the small number of basis functions per atom, the DG scheme already 
exhibits significant speedup in computational time for a small system containing 
128 Na atoms. We demonstrate that the current implementation is able to solve 
systems with thousands of atoms, and that the algorithm has a potential to be 
applied to much larger systems with a more advanced implementation. 

This section is organized as follows: section \EA] introduces the setup of the 
test systems and how the error is quantified. Section 15.21 applies the adaptive 
local basis functions to disordered quasi-lD sodium and silicon system, followed 
by the result for the disordered quasi-2D and bulk 3D systems in section 15.31 
Wc discuss the effect of the penalty parameter a in section 15.41 Finally we 
demonstrate the computational performance of our parallel implementation of 
the adaptive local basis functions in section [5.51 

5.1. Setup 

We use the local density approximation (LDA) [s^ [s^] for the exchange- 
correlation functional, and Hartwigsen-Goedecker-Hutter (HGH) pseudopoten- 
tial (47! with the local and non-local pseudopotential fully implemented in the 
real space [il]. All quantities are reported in atomic units (au). All calculations 
are carried out on the Hopper system maintained at National Energy Research 
Scientific Computing Center (NERSC). Each compute node on Hopper has 24 
processors (cores) with 32 gigabyte (GB) of memory (1.33 GB per core). 

The performance of the adaptive local basis functions are tested using Na 
and Si as representative examples for simple metallic and insulating systems, 
respectively. The crystalline Na has a body centered cubic (bcc) unit cell, with 
2 atoms per cell and a lattice constant of 7.994 au. The crystalline Si has a dia- 
mond cubic unit cell, with 8 atoms per cell and a lattice constant of 10.261 au. 
Each atomic configuration in the following tests is obtained by forming a su- 
percell consisting m x n x p unit cells with perfect crystal structure, and a 
random displacement uniformly distributed in [—0.2,0.2] au is then applied to 
each Cartesian coordinate of each atom in the supercell. The resulting atomic 
configuration is therefore mildly disordered in order to avoid the possible can- 
cellation of errors for the case of perfect crystalline systems. A system is called 
quasi-lD ii 1 ~ m = n < p, quasi-2D ii I = m < n = p, and 3D bulk if 
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1 < m = n = p, respectively. In all the tests below, the element is chosen to 
be the (conventional) unit ceU of the lattice. Fig. [2] shows how a quasi-lD Na 
system with 8 atoms extended along the z direction are partitioned in order to 
generate adaptive local basis functions. The global domain is partitioned into 
4 elements {Ek}'^^^ with 2 atoms per element. The red area represents one of 
the elements E2, and the corresponding extended element Q2 consists of both 
the red area and the blue area (buffer). We recall that the buffer size along 
the x{y, z) direction refers to the length of the buffer area extended beyond the 
boundary of the element along the x{y, z) direction. The unit of buffer size 
is the lattice constant for the perfect crystalline system. Fig. [2] shows the case 
with the buffer size of 0.50 along the z direction, and 0.0 along the x and y 
directions. 

We quantify the error of the adaptive local basis functions the error of the 
total energy per atom which is defined as follows. First, the electronic struc- 
ture problem is solved by using planewaves on the global domain starting from 
a random initial guess of the electron wavefunctions. The total energy after 
reaching self-consistency is denoted by Eg lb- Then, the same electronic struc- 
ture problem is solved by the DG formulation starting from a random initial 
guess of the adaptive local basis functions on each element. The total energy af- 
ter reaching self-consistency is denoted by Edg- The global domain calculation 
and the DG calculation using adaptive local basis functions are therefore com- 
pletely independent, and the error of the total energy per atom is defined to be 
\Eglb — Edg \ /Natom- For simplicity only F point is used in the Brillouin zone 
sampling. The proposed method can be easily generalized to fc-point sampling. 
10 LOBPCG iterations are used in each SCF iteration for the global domain 
calculation, and 3 LOBPCG iterations are used in each SCF iteration for gener- 
ating the adaptive local basis functions in the DG calculation. A small number 
of LOBPCG iterations is already sufficient, since the electron wavefunctions in 
the global domain calculation and the adaptive local basis functions in the DG 
calculations at the end of each SCF iteration can be reused as the initial guess 
in the consequent SCF iteration for continuous refinement. Anderson mixing is 
used for the SCF iteration with a fictitious electron temperature set to be 2000 
K to facilitate the convergence of the SCF iteration. 

The grid spacing for the global domain calculation is 0.4 au for Na and 0.32 
au for Si. This translates to a grid of size 20 x 20 x 20 to discretize one Na 
unit cell and a grid of size 32 x 32 x 32 to discretize one Si unit cell. The 
Legendre-Gauss-Lobatto (LGL) grid for each element is 20 x 20 x 20 for Na and 
40 x 40 X 40 for Si. The LGL grid is only used for the purpose of numerical 
integration in the assembly process of the DG matrix. We remark that this grid 
is denser than what is commonly used for the electronic structure calculation 
for three reasons: 1) The HGH pseudopotential used in the present calculation 
is more stiff than many other pseudopotentials such as the Troullier-Martins 
pseudopotential 2) The potentials and wavefunctions are represented in the 
real space rather than in the Fourier space; 3) Most importantly, a dense grid in 
the real space is needed in both global domain calculations and DG calculations 
in order to reliably refiect the error of the total energy per atom. 
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We remarked in the end of section [3] that the DG framework is very flexible 
and can incorporate not only the adaptive local basis functions but also other 
basis functions such as local polynomials. In practice we find that the adap- 
tive local basis functions are computationally more efficient than polynomials. 
Therefore in the following discussion only adaptive local functions will be used 
in the basis set. The number of adaptive local functions per atom is also referred 
to as the degrees of freedom (DOF) per atom. 

5.2. Disordered Quasi-ID System 

The error of the total energy per atom with respect to different buffer sizes 
and different numbers of basis functions per atom (DOF per atom) is illustrated 
for the disordered quasi-lD sodium system in Fig. [3] (a) and for the disordered 
quasi-lD silicon system in Fig. [3](b). The penalty parameter a is 20. In both 
cases, the error decreases systematically when the buffer size and the number 
of basis functions per atom increase. For Na, the error of the total energy per 
atom is already below 10"'^ au using as few as 4 basis functions per atom with 
a small buffer of size 0.50 (black diamond with solid line). When the buffer size 
is increased to 1.00 (blue star with dashed line), the error of the total energy 
per atom is 4.3 x 10"^ au or 0.01 mcV using 10 adaptive local basis functions 
per atom. 

Similar behavior is found for the silicon system. For a small buffer of size 
0.50 (black diamond with solid line), the error of the total energy per atom is 
2.3 X 10"''' au with 6 basis fimctions per atom. For the buffer of size 1.00 (blue 
star with dashed line), the error of the total energy per atom is 7.8 x 10~^ au or 
0.002 meV using as few as 8 basis functions per atom. Physical intuition suggests 
that the minimum number of basis functions is 4, which reflects one 2s and three 
2p atomic orbitals. 20 40 number of basis functions per atom is generally 
required to achieve good accuracy if Gaussian type orbitals or numerical atomic 
orbitals are to be used [l3]- Therefore for the quasi-lD systems tested here, our 
algorithm achieves nearly the optimal performance in terms of the number of 
basis functions per atom. 

The behavior of the error found above depends weakly on the number of 
atoms of the quasi-lD system extended along the z direction. The error of the 
total energy per atom for disordered quasi-lD systems of different numbers of 
atoms is shown for Na in Fig. 2] (a) and for Si in Fig. 2] (b), respectively. In 
both cases the buffer size is 0.50, and the penalty parameter is 20. Here 4 and 
6 adaptive local basis functions per atom are used for Na and Si, respectively. 

5.3. Disordered Quasi-2D and 3D Bulk Systems 

This section studies the relation between the error of the total energy per 
atom and the dimensionality of the system. The partition of the domain for 
systems of higher dimension is similar to that in the quasi-lD case. Fig. [5] 
shows the partition of a quasi-2D system with 32 sodium atoms, viewed along 
the X direction. The domain is partitioned into 16 disjoint elements. The 
length of each element (red area) is equal to the length of the lattice constant 
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of the crystalline unit cell. The corresponding extended element for solving the 
adaptive local basis functions includes both the element (red area) and the buffer 
(blue area). Fig. [5] (a) shows the behavior of the error for a disordered quasi-2D 
sodium system containing 32 atoms with the buffer of size 0.50 (black diamond 
with solid line) and of size 1.00 (blue star with dashed line), respectively. For 
the case with the buffer size equal to 0.50, the error of the total energy per atom 
is 1.0 X 10"'^ au using 8 basis functions per atom. The error of the total energy 
per atom can reach 2.8 x 10~^ au with 16 basis functions per atom and buffer 
size 1.00. Fig. [B] (b) shows the behavior of the error for a disordered bulk 3D 
sodium system containing 128 atoms with the buffer of size 0.50 (black diamond 
with solid line) and of size 1.00 (blue star with dashed line), respectively. For 
the case with the buffer size equal to 0.50, the error of the total energy per 
atom is 1.2 x 10"'^ au using 24 basis functions per atom. The error of the total 
energy per atom can reach 5.6 x 10~^ au or 0.15 meV with 42 basis functions 
per atom and buffer size 1.00. Compared to the quasi-lD case, the number of 
adaptive local basis functions per atom increases significantly in order to reach 
the same accuracy. The increasing number of basis functions is partly due to 
the increasing number of Na atoms in the extended element. In this case, the 
numbers of the Na atoms in the extended element with a buffer size of 1.00 are 
4, 18, 54 for quasi-lD, quasi-2D and bulk 3D systems, respectively. The increased 
number of Na atoms in the extended elements requires more eigenfunctions in 
the extended elements, and therefore more adaptive local basis functions per 
atom in the elements. 

5.4- The penalty parameter 

The interior penalty formulation of the discontinuous Galerkin method con- 
tains an important parameter a to guarantee stability, a = 20 has been applied 
uniformly to all the examples studied so far. The a-dependence of the error of 
the total energy per atom is shown for the quasi-lD sodium system in Fig. [7] 
(a) and for the quasi-lD silicon system in Fig. [7](b), respectively. For Na, the 
buffer size is 1.00, and the number of basis functions per atom is 8. The error 
of the total energy per atom is empirically proportional to a^'®^ up to a = 640. 
For Si, the buffer size is 1.00, and the number of basis functions per atom is 6. 
The error of the total energy per atom is empirically proportional to up 
to a = 640. We also remark that the DG formulation can become unstable for 
a smaller than a certain threshold value. For example, the error of the total 
energy per atom is 2.9 x 10^^ au for Na with a = 5, and the error of the total 
energy per atom is 1.7 x 10~^ au for Si with a = 10. Therefore the penalty 
parameter a plays an important role in the stability of the algorithm, but the 
DG scheme can be accurate and stable with respect to a large range of a values. 

5.5. Computational efficiency 

The small number of the adaptive basis functions per atom can lead to sig- 
nificant savings of the computational time. We illustrate the efficiency of our 
algorithm using a disordered bulk 3D sodium system with the buffer size of 1.00 
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and with 16 basis functions per atom. Fig. [S] (b) suggests that the error of the 
total energy per atom is about 10"'^ au for this choice of the parameters. The 
size of each element is equal to the lattice constant with 2 Na atoms in each ele- 
ment. The size of the global domain ft ranges from 4x4x4 unit cells with 128 
Na atoms to 12 x 12 x 12 elements with 3, 456 atoms. The number of processors 
(cores) used is proportional to the number of elements, and 1,728 processors 
are used in the problem with 12x12x12 elements. We compare the wall 
clock time for one step self consistent iteration with 3 LOBPCG iterations for 
solving the adaptive basis functions in the extended elements. Fig. |5] compares 
the wall clock time for solving the DG eigenvalue problem using ScaLAPACK 
function pdsyevd (red triangle with solid line), the time for generating the adap- 
tive local basis functions in the extended elements using LOBPCG solver (blue 
diamond with dashed line), and the time for the overhead in the DG calculation 
(black circle with dot dashed line). The buffer size is 1.00, and the number of 
basis functions per atom is 16. Since both the size of the extended elements 
and the number of basis functions per atom are fixed, the computational time 
for solving the adaptive basis functions does not depend on the global domain 
size. The overhead in the DG calculation method includes mainly the assem- 
bly process of the DG Hamiltonian matrix via numerical integration and data 
communication. All numerical integrations are localized inside each element 
and its neighboring elements. Our implementation ensures that the data com- 
munication is restricted to be within nearest neighboring elements. Therefore 
the time for the overhead increases mildly with respect to the global system 
size. The complexity of the DG eigensolver using pdsyevd scales cubically with 
respect to global system size in the asymptotic limit, and starts to dominate 
the cost of computational time for system containing more than 1,000 atoms. 
Since the number of processors is proportional to the number of elements, the 
asymptotic wall clock time for the DG eigensolver should scales quadratically 
with respect to the number of atoms. The practical wall clock time for solving 
the DG eigensolver is found to be proportional to {NatomY'^'^ (magenta dashed 
line in Fig. indicating that the asymptotic cubic scaling has not yet been 
reached. In the largest example with 3,456 atoms, the matrix size of the DG 
Hamiltonian matrix is 55,296. 

The efficiency due to the dimension reduction of the adaptive basis functions 
can be illustrated by the comparison between the cost of the computational time 
of the LOBPCG eigensolver in the global domain calculation (Global), and 
that of the DG eigenvalue problem with the adaptive basis functions (DG), as 
reported in Table [TJ The global domain calculation uses 10 LOBPCG iteration 
steps per SCF iteration. On a single processor, the global domain calculation 
costs 806 sec for the bulk 3D sodium system with 128 atoms, and 19,112 sec 
for the bulk 3D sodium system with 432 atoms. By assuming that the global 
domain calculation can be ideally parallelized, the third column of Table [T] 
reports the computational time of the global domain calculation measured on a 
single processor divided by the number of processors used in the corresponding 
DG eigensolver. The fourth column reports the wall clock time for the DG 
eigensolver executed in parallel. We remark that the computational time for 



18 



solving the adaptive local basis functions is not taken into account, since we are 
comparing the savings of the computational time due to the dimension reduction 
of the basis functions. It is found that the saving of the computational time is 
already significant even when the system size is relatively small. 



Atom# 


Proc# 


Global (sec) 


DG (sec) 


128 


64 


13 


1 


432 


216 


88 


14 



Table 1: The comparison of the cost of the wall clock time using the LOBPCG 
iteration on the global domain (performed with a single processor and divide 
the time by the number of processors in column 2, assuming that the LOBPCG 
are perfectly parallelized) and the wall clock time using the adaptive local basis 
functions (only count the DG eigenvalue solver using ScaLAPACK with the 
number of processors in column 2). The systems under study are the bulk 3D 
sodium system with 4x4x4 elements (128 Na atoms), and with 6x6x6 
elements (432 Na atoms), respectively. 



6. Discussion and Conclusion 

In this paper we proposed the adaptive local basis functions for discretiz- 
ing the Kohn-Sham Hamiltonian operator, and demonstrated that the adaptive 
local basis functions are efRcient for calculating the total energy and electron 
density, and can reach high accuracy with a very small number of basis func- 
tions per atom. The adaptive local basis functions are discontinuous in the 
global domain, and the continuous Kohn-Sham orbitals and electron density 
are reconstructed from these discontinuous basis functions using the discontinu- 
ous Galerkin (DG) framework. The environmental effect is automatically built 
into the basis functions, thanks to the flexibility provided by the DG framework. 

The current implementation of the DG method is already able to perform 
the total energy calculation for systems consisting of thousands of atoms. The 
performance of the DG method can be improved by taking into account the 
block sparsity of the DG stiffness matrix. Furthermore, the local nature of 
the adaptive basis functions allows us to incorporate the recently developed 



pole expansion and selected inversion type fast algorithms |49l - l52l | into the DG 
framework. The capability of the resulting algorithm is expected to be greatly 
enhanced compared to the current implementation. This is our ongoing work. 

In order to generalize the current framework to the force calculation and 
further to the geometry optimization and the ab initio molecular dynamics 
simulation, the adaptive local basis functions and their derivatives with respect 
to the positions of the atoms (called Pulay force [s^]) should be both accessible. 
Recently we propose the optimized local basis functions [H^l that is able to 
systematically control the magnitude of the Pulay force, which is a further 
improvement of the adaptive local basis functions. This is also our ongoing 
work. 
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Figure 1: (color online) The electron density (a) and the norm of the gradient 
of the electron density (b) on a (001) slice of a mono-crystalline silicon system 
passing through two Si atoms. The two Si atoms are located at (2.57,2.57) au 
and at (7.70, 7.70) au in this plane, respectively. Even in the pseudopotential 
framework, the electron density is much more oscillatory around the nuclei of 
the Si atoms and is smooth in the interstitial region. 
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Figure 2: (color online) A quasi- ID disordered Na system with 8 atoms extended 
along the z direction, viewed along the x direction. The length of each empty 
box is equal to the lattice constant for the perfect Na crystal. The red area 
represents one of the elements E^- The corresponding extended element Q-2, 
consists of both the red area and the blue area (buffer). The buffer size is 0.50 
(in the unit of lattice constant) along the z direction, and is 0.0 along the x and 
y directions. 



21 




Figure 3: (color online) (a) The error of the total energy per atom (the y 
axis, plotted in log-scale) for a disordered quasi-lD sodium system consisting 
of 8 atoms, with respect to the number of adaptive local basis functions per 
atom (the x axis). The buffer sizes are chosen to be 0.50 (black diamond with 
solid line), and 1.00 (blue star with dashed line), (b) The error of the total 
energy per atom (the yaxis, plotted in log-scale) for a disordered quasi-lD silicon 
system consisting of 32 atoms, with respect to the number of adaptive local basis 
functions per atom (the x axis). The legend is the same as in (a). 
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Figure 4: (color online) (a) The error of the total energy per atom (the y axis) 
for disordered quasi-lD sodium systems of different numbers of atoms (the x 
axis) extended along the z direction. The buffer size is 0.50, and 4 adaptive 
local basis functions per atom arc used in each calculation, (b) The error of the 
total energy per atom for the disordered quasi-lD silicon systems of different 
numbers of atoms (the x axis) extended along the z direction. The buffer size is 
0.50, and 6 adaptive local basis functions per atom are used in each calculation. 
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Figure 5: (color online) A quasi-2D disordered Na system with 32 atoms ex- 
tended along the y and the z directions, viewed along the x direction. The 
red area represents one of the elements £'2,2, and the corresponding extended 
element Q2,2 consists of both the red area and the blue area (buffer). The buffer 
size is 0.50 (in the unit of lattice constant) along the y and the z directions, and 
is 0.0 along the x direction. 
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Figure 6: (color online) (a) The error of the total energy per atom (the y axis, 
plotted in log-scale) for a disordered quasi-2D sodium system containing 32 
atoms, with respect to the number of basis functions per atom (the x axis). 
The buffer size is chosen to be 0.50 (black diamond with solid line), and 1.00 
(blue star with dashed line), respectively, (b) The error of the total energy per 
atom for a disordered bulk 3D sodium system (the y axis, plotted in log-scale) 
containing 128 atoms, with respect to the number of basis functions per atom 
(the X axis). The buffer size is chosen to be 0.50 (black diamond with solid 
line), and 1.00 (blue star with dashed line), respectively. 
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Figure 7: (color online) (a) Log-log plot for the error of the total energy per 
atom (the y axis) with respect to the penalty parameter a (the x axis), for a 
quasi-lD sodium system with 8 atoms. The buffer size is f .00 and the number 
of basis functions per atom is 12. The error (black diamond with solid line) 
can be fitted with a polynomial function of a (blue dashed line), (b) Log- log 
plot for the error of the total energy per atom (the y axis) with respect to the 
penalty parameter a (the x axis), for a quasi-lD silicon system with 32 atoms. 
The buffer size is 1.00 and the number of basis functions per atom is 6. The 
error (black diamond with solid line) can be fitted with a polynomial function 
of a (blue dashed line). 



26 




128 432 1024 3456 

Atom # 



Figure 8: (color online) Log-log plot for the wall clock time {y axis) for solving 
disordered bulk 3D sodium systems of different sizes (x axis) with one step self- 
consistent field iteration. The number of processors is chosen to be proportional 
to the number of atoms, with 1,728 processors used for the largest problem 
solved here (3, 456 Na atoms). The total wall clock time is broken down into the 
time for solving the DG eigenvalue problem using ScaLAPACK function pdsyevd 
(red triangle with solid line), the time for generating the adaptive local basis 
functions in the extended elements using LOBPCG solver (blue diamond with 
dashed line), and the time for the overhead in the DG calculation, including 
the matrix assembly and data communication (black circle with dot dashed 
line). The buffer size is 1.00, and the number of basis functions per atom is 
16. The scaling of the wall clock time for solving the DG eigenvalue problem 
using pdsyevd with respect to the number of atoms is illustrated by the magenta 
dashed line. 
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